An objective measure for estimating MOS of synthesized speech
Authors
Abstract
This paper proposes an average concatenative cost function as an objective measure of the naturalness of synthesized speech. All seven of its component costs can be derived directly from the input text and the scripts of the speech database. A formal Mean Opinion Score (MOS) experiment shows that the average concatenative cost and its seven components are all highly correlated with subjectively obtained MOS. The correlation coefficient between the objective and subjective measures is -0.872. For individual waveforms, the mean error in MOS estimation is 0.32, with an RMSE of 0.40. When estimating the overall MOS of a TTS system, the mean error is smaller than 0.05. The proposed objective measure makes it easy to track naturalness regularly. The cost function could also serve as a criterion for optimizing algorithms for unit selection and speech database pruning.
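The estimation procedure summarized in the abstract can be sketched roughly as follows: average the component costs of each utterance, measure how strongly that average correlates with subjective MOS, and fit a line that maps cost to an estimated MOS. This is only a minimal sketch under stated assumptions. The abstract does not name the seven component costs, their weights, or the fitting method, so the equal weights, the least-squares line, and all numbers below are hypothetical illustrations rather than the authors' method or data.

```python
from math import sqrt
from statistics import mean


def average_cost(component_costs, weights=None):
    """Weighted mean of one utterance's component costs.

    Equal weights are an assumption; the abstract gives no weighting.
    """
    if weights is None:
        weights = [1.0] * len(component_costs)
    return sum(w * c for w, c in zip(weights, component_costs)) / sum(weights)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def rmse(predicted, actual):
    """Root-mean-square error between predictions and reference values."""
    return sqrt(mean([(p - a) ** 2 for p, a in zip(predicted, actual)]))


# Made-up data: seven hypothetical component costs per utterance and a
# made-up subjective MOS for each. None of these values come from the paper.
utterance_costs = [
    [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2],
    [0.4, 0.5, 0.3, 0.6, 0.4, 0.5, 0.4],
    [0.7, 0.8, 0.6, 0.9, 0.7, 0.8, 0.7],
    [0.3, 0.2, 0.4, 0.3, 0.3, 0.2, 0.3],
]
subjective_mos = [4.2, 3.4, 2.5, 3.9]

avg_costs = [average_cost(c) for c in utterance_costs]
r = pearson(avg_costs, subjective_mos)  # negative when higher cost means lower MOS
slope, intercept = fit_line(avg_costs, subjective_mos)
estimated_mos = [slope * c + intercept for c in avg_costs]

print("correlation:", r)
print("RMSE of estimated MOS:", rmse(estimated_mos, subjective_mos))
```

A negative correlation, like the one the abstract reports, is what one would expect here, since a larger concatenative cost should correspond to less natural speech and therefore a lower MOS.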
Similar resources
An Evaluation of Synthetic Speech Using the PESQ Measure
The paper presents experiments on the use of the perceptual objective measure – ITU-T Rec. P.862 Perceptual Evaluation of Speech Quality (PESQ), for the automatic evaluation of synthetic speech. The approach is based on the evaluation of the statistically significant correlation between the outputs of subjective and objective tests. We propose the following technique to evaluate the usage of th...
Objective Quality Assessment of Wideband Speech Coding using W-PESQ Measure and Artificial Voice
An objective quality measurement methodology for wideband-speech coding has been studied, its essential components being an objective quality measure and an input test signal. Wideband-PESQ conforming to draft Recommendation P.862 has been studied as the objective quality measure. The Wideband-PESQ has been verified from the viewpoint of the consistency between subjectively evaluated MOS and ob...
Improvement of MBSD by scaling noise masking threshold and correlation analysis with MOS difference instead of MOS
The Modified Bark Spectral Distortion (MBSD), used for an objective speech quality measure, was presented previously [1][2]. The MBSD measure estimates speech distortion in the loudness domain taking into account the noise masking threshold in order to include only audible distortions in the calculation of the distortion measure. Preliminary simulation results have shown improvement of the MBSD...
Incorporation of temporal masking effects into bark spectral distortion measure
The objective of this paper is to extend a promising objective speech distortion measurement method, the Bark Spectral Distance (BSD) measure, with the auditory concepts of forward and backward temporal masking to improve its measurement accuracy. The results of this investigation show that automatic BSD-based speech quality ratings may be made to correlate better with existing MOS ratings by r...
Improvement of prosodic characteristic in Vietnamese speech synthesis system base on HMM
The key factors helping people to understand the synthesized voices of text-to-speech system are the naturalness and the intelligibility. However, making more natural voices remains a difficult task because of the speech data’s scarcity. With data limited corpus, prosodic information such as tone, intonation, Part-of-Speech is added to ensure the quality of synthetic speech. In the paper, we in...
Publication date: 2001